CFQI: Fitted Q-Iteration with Complex Returns

Authors

  • Robert William Wright
  • Xingye Qiao
  • Lei Yu
  • Steven Loscalzo
Abstract

Fitted Q-Iteration (FQI) is a popular approximate value iteration (AVI) approach that makes effective use of off-policy data. FQI uses a 1-step return value update which does not exploit the sequential nature of trajectory data. Complex returns (weighted averages of the n-step returns) use trajectory data more effectively, but have not been used in an AVI context because of off-policy bias. In this paper we propose a new generalization of FQI called Complex Fitted Q-Iteration (CFQI) which allows for complex returns. Theoretical properties are proved that show CFQI does not break existing convergence properties. Two methods for integrating complex returns are presented. The first method uses a simple truncating procedure for reducing off-policy bias. Our second method applies a novel bounding operation that utilizes the off-policy bias. We provide an empirical evaluation of the proposed methods on several reinforcement learning benchmarks. The results demonstrate that our methods significantly improve over FQI in terms of value estimation accuracy, policy performance, and convergence speed.
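To make the abstract's terminology concrete, the following is a minimal sketch of how n-step returns and a complex return (a weighted average of those returns) can be computed along one trajectory. It is not the authors' CFQI implementation: the function names, the argument layout, and the lambda-style weighting are assumptions chosen for illustration, and the truncating and bounding procedures mentioned in the abstract are not reproduced here because the abstract does not specify them.

```python
from typing import List, Sequence


def n_step_returns(rewards: Sequence[float],
                   next_state_values: Sequence[float],
                   gamma: float) -> List[float]:
    """Return [R^(1), ..., R^(N)] for a trajectory segment starting at time t.

    rewards[i] is r_{t+i}; next_state_values[i] is an estimate of
    max_a Q(s_{t+i+1}, a) (hypothetical inputs for this sketch).
    R^(n) = sum_{i<n} gamma^i * r_{t+i} + gamma^n * next_state_values[n-1].
    """
    returns = []
    discounted_sum = 0.0
    for n, (r, v) in enumerate(zip(rewards, next_state_values), start=1):
        discounted_sum += gamma ** (n - 1) * r
        returns.append(discounted_sum + gamma ** n * v)
    return returns


def lambda_weights(num_returns: int, lam: float) -> List[float]:
    """Unnormalized weights lam^(n-1) for n = 1..N (one common choice)."""
    return [lam ** (n - 1) for n in range(1, num_returns + 1)]


def complex_return(returns: Sequence[float],
                   weights: Sequence[float]) -> float:
    """Weighted average of the n-step returns; weights are normalized here."""
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, returns)) / total


# Toy trajectory segment: three rewards of 1 and a constant value estimate.
returns = n_step_returns([1.0, 1.0, 1.0], [10.0, 10.0, 10.0], gamma=0.5)
one_step_target = complex_return(returns, [1.0, 0.0, 0.0])  # the FQI-style target
averaged_target = complex_return(returns, lambda_weights(3, lam=1.0))
```

Putting all the weight on the first return recovers the 1-step target that FQI uses, while spreading the weight over longer returns draws on more of the trajectory. With off-policy data the longer returns may be biased, which is the issue the abstract says the two proposed methods address.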


Similar resources

Deep Reinforcement Learning with Regularized Convolutional Neural Fitted Q Iteration

We review the deep reinforcement learning setting, in which an agent receiving high-dimensional input from an environment learns a control policy without supervision using multilayer neural networks. We then extend the Neural Fitted Q Iteration value-based reinforcement learning algorithm (Riedmiller et al) by introducing a novel variation which we call Regularized Convolutional Neural Fitted Q...


Bias Correction and Confidence Intervals for Fitted Q-iteration

We consider finite-horizon fitted Q-iteration with linear function approximation to learn a policy from a training set of trajectories. We show that fitted Q-iteration can give biased estimates and invalid confidence intervals for the parameters that feature in the policy. We propose a regularized estimator called soft-threshold estimator, derive it as an approximate empirical Bayes estimator, ...


Learning to Play the Worker-Placement Game Euphoria using Neural Fitted Q Iteration

We design and implement an agent for the popular worker placement and resource management game Euphoria using Neural Fitted Q Iteration (NFQ), a reinforcement learning algorithm that uses an artificial neural network for the action-value function which is updated off-line considering a sequence of training experiences rather than online as in typical Q-learning. We find that the agent is able t...


Optimizing Spoken Dialogue Management from Data Corpora with Fitted Value Iteration

In recent years machine learning approaches have been proposed for dialogue management optimization in spoken dialogue systems. It is customary to cast the dialogue management problem into a Markov Decision Process (MDP) and to find the associated optimal policy using Reinforcement Learning (RL) algorithms. Yet, the dialogue state space is usually very large (even infinite) and standard RL algo...


Optimizing spoken dialogue management with fitted value iteration

In recent years machine learning approaches have been proposed for dialogue management optimization in spoken dialogue systems. It is customary to cast the dialogue management problem into a Markov Decision Process (MDP) and to find the associated optimal policy using Reinforcement Learning (RL) algorithms. Yet, the dialogue state space is usually very large (even infinite) and standard RL algo...




Publication date: 2015